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Abstract 

What  we  believe  images  are  determines  how  we  take  actions  in  image  and  low- 
level  vision  analysis.  In  the  Bayesian  framework,  it  is  known  as  the  importance 
of  a  good  image  prior  model.  This  paper  intends  to  give  a  concise  overview  on 
the  vision  foundation,  mathematical  theory,  computational  algorithms,  and  various 
classical  as  well  as  unexpected  new  applications  of  the  BV  (bounded  variation) 
image  model,  first  introduced  into  image  processing  by  Rudin,  Osher,  and  Fatemi 
in  1992  [Physica  D,  60:259-268]. 
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1  Introduction:  Image  modeling 

Image  modeling,  namely,  finding  a  suitable  way  to  describe  and  represent  images,  is 
perhaps  the  most  fundamental  and  crucial  step  for  the  whole  ladder  of  tasks  in  image 
and  low-level  vision  analysis.  The  underlying  philosophy  is:  the  way  we  process  and 
analyze  images  depends  very  much  on  what  we  believe  (or  model )  they  are. 

A  household  analogy  would  be  the  prediction  of  the  stock  price.  If  it  is  believed 
to  be  a  smooth  function  of  dates,  then  tomorrow’s  price  can  be  well  predicted  by  those 
of  today  and  yesterday  by  simple  polynomial  interpolations.  But  if  more  realistically, 
the  price  stream  is  modeled  as  a  stochastic  process,  then  the  prediction  has  to  be  based 
on  the  features  of  such  process.  The  interaction  between  image  processing  and  image 
modeling  is  very  much  the  same  story. 

As  of  February,  2002,  the  Google  search  engine  returns  about  50,000,000  docu¬ 
ments  containing  the  word  “image.”  But  this  broad  usage  does  not  mean  that  we  have 
already  had  a  rigorous  mathematical  definition.  In  fact,  even  the  Webster’s  Dictio¬ 
nary  kicks  the  definition  of  “image”  onto  that  of  “picture,”  and  then  explains  the  latter 
vaguely  as  “a  representation  made  by  painting,  drawing,  or  photography,”  which  says 
nothing  but  only  how  “images”  or  “pictures”  are  formed.  It  reminds  us  the  concept  of 
weight.  Mankind  had  blindly  used  it  for  thousands  of  years  until  the  giants  Newton  and 
Einstein  first  tried  to  decipher  the  meaning  of  gravity. 

The  mathematical  challenge  of  image  modeling  roots  in  the  diversity  and  complex¬ 
ity  of  images,  from  the  rich  geometric  structures  to  a  large  dynamic  range  of  scales. 
Most  of  us  do  not  consider  it  a  good  idea  to  lazily  vote  for  any  function  u(x,  y)  (equally) 
as  an  image.  But  it  seems  that  no  one  has  yet  seized  the  right  tool  to  characterize  the 
boundary  between  images  and  non-images.  Perhaps  there  is  no  such  sharp  boundary 
at  all.  That  is  expressed  by  the  well  known  Gibbs  ’  Random  Fields  model  of  Geman 
and  Geman  [GG84].  Based  on  filtering  and  statistical  learning,  the  model  has  been  de¬ 
veloped  more  generally  by  Zhu,  Wu,  and  Mumford  [ZWM97,  ZM97].  Such  stochastic 
approach  for  image  modeling  gets  more  theoretically  matured  in  the  very  recent  work 
of  Mumford  and  Gidas  [MG01]  based  on  infinitely  divisible  law  and  axiomatization. 

Away  from  the  stochastic  theory  of  images,  is  the  exploration  of  possible  deter¬ 
ministic  image  models.  Such  transition  is  perhaps  best  described  by  Yves  Meyer’s 
very  recent  “u  +  v”  notion  [MeyOl],  Here  v(x,y)  represents  the  rapidly  oscillatory 
component  (noise  or  textures),  or  a  stochastic  sampling,  while  u(x,y )  captures  the 
deterministic  features. 

In  the  very  beginning  of  computer  vision  and  artificial  intelligence,  Marr  and  his 
colleagues  [MH80]  already  noticed  the  importance  of  edges  for  image  understanding 
and  visual  communication.  Edges  are  indeed  an  intrinsic  feature  for  images  since  they 
define,  segment,  and  correlate  individual  objects  [NMS93].  Thus  the  deterministic 
component  u  should  at  least  allow  edges,  or,  one  dimensional  singularities,  and  cannot 
be  a  traditional  Sobolev  function.  Mumford  and  Shah  [MS89]  singled  out  these  edge 
features  and  proposed  the  famous  object-edge  free  boundary  image  model.  Recently 
Donoho  and  his  students  have  developed  geometric  wavelets  such  as  curvelets  to  model 
the  component  u  (while  leaving  the  oscillatory  component  v  resonant  with  conventional 
wavelets)  [DonOO], 

Is  there  a  simple  linear  functional  space  that  legalizes  edges  and  is  easy  to  work 
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with,  but  is  not  too  loose  to  include  too  many  “uninteresting”  images.  The  answer 
was  discovered  by  Rudin,  Osher,  and  Fatemi  [ROF92,  R094]  in  1992.  It  is  the  Ba¬ 
nach  space  of  functions  with  bounded  variations  (BV).  Ever  since,  the  model  has  wit¬ 
nessed  many  applications  in  image  denoising,  deblurring,  interpolation  and  inpainting, 
super-resolution  and  zooming,  error  concealment  in  wireless  image  transmission,  med¬ 
ical  imaging,  and  various  inverse  problems  (see,  for  examples,  [ROF92,  DS96,  V096, 
CW98,  CSOld,  COSOl,  CSOla]). 

The  current  paper  attempts  to  give  an  overview  on  the  theory  and  applications  of 
Rudin,  Osher,  and  Fatemi’s  BV  image  model  for  image  restoration,  with  a  special 
emphasis  on  our  recent  work  of  employing  the  B  V  image  model  as  an  image  interpolant 
for  the  inpainting  problem  [CSOla,  CSOlb,  CSOlc,  CS02], 

Section  II  briefly  lays  out  the  Bayesian  foundation  for  variational  image  restora¬ 
tion.  Section  III  introduces  the  original  Rudin-Osher-Fatemi  TV  restoration  model. 
We  present  both  its  theory  and  computation.  In  section  IV,  we  explain  our  recent  effort 
on  applying  the  BV  image  model  as  an  efficient  image  interpolant  for  the  inpainting 
problem,  highlighted  with  several  computational  examples.  The  last  section  discusses 
two  extra  issues  related  to  the  BV  image  model  and  then  concludes  the  paper. 


2  Bayesian  Framework  for  Image  Restoration 


If  there  indeed  exists  the  most  important  principle  in  the  entire  field  of  image  and  vision 
analysis,  it  has  to  be  the  Bayesian  rule. 

Many  problems  in  image  and  vision  analysis  can  be  set  up  as  follows.  We  are  to 
infer  some  feature  or  patten  (vector  or  continuous  field)  F  from  a  given  measured  or 
observed  data  field  AV  For  example,  for  image  restoration,  A”q  corresponds  to  a  given 
corrupted  image  r/0,  which  is  often  snowed  by  noise,  blurred  by  de-focusing  or  medium 
scattering,  or  has  certain  data  missing  during  the  transmission  process;  and  F  denotes 
the  ideal  image  u  one  would  get  without  all  those  distortion  effects.  For  vision  analysis, 
,Y(I  may  represent  the  2-D  image  u,  while  F  denotes  the  3-D  configuration  parameters 
(illuminance  and  reflectance,  etc.)  [Ker]. 

The  ideal  inference  of  F  from  .Y0  is  naturally  the  one  that  maximizes  the  posterior 
probability  Prob(F1|A^o).  According  to  the  Bayes  formula 


Prob(F\X0) 


Prob(A0| F)  Prob (F) 
Prob(Ao) 


it  suffices  to  maximize  the  product  of  the  data  model  Prob(A^o \F)  and  the  prior  model 
Prob {F),  since  the  denominator  is  merely  a  normalization  constant  once  .A 0  is  given. 
The  prior  model  Profit  F)  specifies  how  often  a  pattern  F  can  be  observed  a  priori , 
i.e.,  independent  of  any  observation  made.  The  data  model  Prob(.Ao|  F)  then  reveals 
the  likelihood  for  .Y()  being  generated  from  a  given  pattern  F. 

If  one  has  the  a  priori  evidence  for  the  importance  of  geometric  structures  in  the 
pattern  distribution  Prob(F)  (such  as  edges  and  their  geometry  for  image  understand¬ 
ing),  then  it  is  more  convenient  to  work  with  the  “energy”  form  of  the  Bayesian  method, 
as  Mumford  did  for  various  segmentation  models  [Mum94],  This  is  at  least  formally 
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achieved  via  Gibbs’  formula  in  statistical  mechanics  [Gib02]:  the  likelihood  for  a  con¬ 
figuration  F  being  observed  is  associated  to  its  energy  E[F\  by 

Prob(F)  =  iexp(-fEl). 

where  k  and  T  denote  the  Boltzmann  constant  and  absolute  temperature,  and  Z  the 
partition  function  over  all  the  permissible  configurations.  The  meaning  of  the  energy 
E[Xq 1 F]  is  similarly  defined,  though  lacking  a  rigorous  counterpart  in  statistical  me¬ 
chanics.  Therefore,  the  Bayesian  method  leads  to  the  energy  minimization  problem 

min  E[F]  +  E[X0\F]. 

F 

In  the  literature  of  deterministic  inverse  problems,  this  corresponds  to  the  celebrated 
idea  of  Tikhonov  regularization  [Tik63],  The  Bayesian  approach  is  more  general  in 
many  aspects. 

In  terms  of  image  restoration  uq  — >  u,  we  are  to  minimize 

E[u\  +  E[uo\u\.  (1) 

The  data  model  u  —>■  uq  depends  on  the  real  physical  imaging  process.  One  popular 
and  useful  choice  as  in  astronomic  and  many  medical  imaging  processes  is  blurring 
followed  by  noising  [ROF92,  CW98]: 

Uq  =  Ku  +  n, 

where  n  denotes  additive  noise,  and  the  linear  operator  K  models  the  blurring  process 

Ku(x)=  /  K(x,y)u(y)dy. 

Jn 

K  is  lowpass  in  the  sense  that  K 1  =  1.  As  well  known  in  signal  processing,  if  K 
is  shift-invariant  (or  spatially  homogeneous),  then  it  has  to  be  an  ordinary  filter  h(x)* 
via  the  convolution  operator:  K(x,  y)  =  h{x  —  y).  Although  realistic  blurring  factors 
fluctuate  randomly,  most  often  we  observe  that  K  is  fixed  deterministically.  We  shall 
do  so  in  this  paper  as  well.  Modeling  the  white  noise  by  Gaussian,  we  easily  obtain  the 
energy  for  the  data  model  (up  to  a  multiplier): 

.  |  ,  f  (Ku(x)  —  Uq(x))2 

E[uq\u]  =  /  - - —dx,  (2) 

Jn  crW'G 

where  ct2(x)  is  the  noise  variance  at  pixel  x,  and  is  a  constant  for  homogeneous 
noise.  Generally  l/a'2{x)  simply  contributes  as  a  positive  weight  w{x),  and  the  en¬ 
ergy  presents  a  weighted  least  square  fitting  as  discussed  in  Strang  [Str93], 

Other  blurring  and  noising  models  are  also  possible  depending  on  the  real  imaging 
processes.  For  example,  Rudin  and  Osher  also  studied  the  multiplicative  noise  model 
in  |  R094] . 

Therefore,  the  restoration  quality  by  (1)  crucially  depends  on  the  choice  of  the 
image  model  E[u\.  The  BV  image  model  of  Rudin-Osher-Fatemi  [ROF92]  captures 
the  edge  feature  of  images,  and  is  perhaps  the  most  efficient  geometric  image  model  in 
terms  of  theoretical  accessibility,  computational  efficiency,  and  applicational  quality. 
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3  The  BV  Model  of  Rudin,  Osher,  and  Fatemi 

3.1  Functions  with  bounded  variation 

We  start  with  some  essential  mathematical  theory  on  functions  with  bounded  varia¬ 
tions.  We  refer  to  the  outstanding  monograph  by  Giusti  [Giu84]  for  more  details. 

Let  fl  C  R2  denote  the  open  image  domain,  which  for  most  real  applications  bears 
a  rectangular  shape.  For  each  real  function  /  £  L11oc(fl)  (i.e.,  locally  integrable),  its 
total  variation  TV(/)  is  defined  in  the  distributional  sense: 


TV (/)  =  sup  /  /(V  •  g)  dx,  (3) 

gecl(Q,B2)  Jn 

where  Bn  denotes  the  unit  disk  in  R2,  and  the  space  of  test  functions  is 

Cq( fl,  Bn)  =  {all  C 1  maps  g  :  fl  — »■  Bn ,  which  are  compactly  supported}. 

Since  /  €  Lloc(Q)  and  V  •  g  £  Coity,  TV(/)  is  well  defined.  TV(/)  >  0  since 
Bn  is  closed  under  reflection  x  =  {x\,xn)  —t  —x.  Suppose  /  is  in  the  Sobolev  space 
IV1,1  (fl),  then  V/  G  L^fl)  and 

-  /  /V  ■  g  dx  =  I  (V/)  •  g  dx, 

Jn  Jn 

which  immediately  implies  that 

TV(/)  =  £  |  V  / 1  dx  =  £  s/fl+fl  dXldx2. 

It  is  for  this  reason  that  in  BV  theory,  TV  (/)  is  also  denoted  by  /  I)  / 1 ,  with  the  sym- 

Jn 

bol  D  reminding  the  conventional  differentiation  V,  and  the  absence  of  the  Lebesgue 
area  element  dx  indicating  that  \Df\  is  a  general  Radon  measure. 

The  Rudin-Osher-Fatemi  image  model  takes  E[u]  =  TV (u)  from  the  previous 
section  [ROF92,  R094],  and  assumes  that  all  observable  images  have  finite  variation. 
The  space  of  bounded  variation  is  defined  as 

BV(fl)  =  {/  :  /  G  L^Q)  and  TV(/)  <  oo}. 

It  can  be  easily  shown  that  BV(ff)  is  a  Banach  space  under  the  BV  norm 

II/IIbv  =  II/IIli  +  tv(/), 

and  it  is  continuously  embedded  in  L1  (fl). 

Among  all  the  important  properties,  there  are  three  ones  that  have  helped  Rudin- 
Osher-Fatemi’s  BV  image  model  become  easily  accessible  in  theory,  and  meaningful 
for  applications  in  image  and  low-level  vision  analysis. 

Lower  semi-continuity  of  the  TV  norm  in  L1  (fl)  says  if  /„  — >■  /  weakly  in  L1  (fl), 
then 

/  \Df  \  <  liminf  /  \Dfn\. 

Jn  n^°°  Jn 
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In  addition,  the  embedding  BV(H)  — »■  L  1  (il)  is  compact,  i.e.,  the  unit  ball  of  BV(fl) 
is  compact  in  L1  (Q).  As  well  practiced  in  the  direct  method  in  Calculus  of  Variations, 
these  two  key  properties  together  often  point  to  the  existence  of  minimizers  for  energies 
involving  the  TV  norm.  This  is  indeed  the  case  in  Chambolle  and  Lions’  work  on  the 
TV  restoration  model  [CL97],  which  will  be  outlined  in  the  next  section. 

The  third  property  reveals  the  geometric  nature  of  the  TV  norm,  and  thus  strongly 
supports  its  application  in  geometry  motivated  vision  and  image  analysis.  It  is  the 
co-area  formula.  Define  the  perimeter  Per(Q)  of  a  domain  Q  C  0  to  he 

Per(Q)  =  /  \DlQ(x)\  =  TV(lg), 

J  Q 

which  generalizes  the  conventional  notion  of  length  for  a  regular  boundary  dQ.  For 
any  function  u  G  BV(fl),  the  co-area  formula  says 

n  roo 

/  \Du\  =  /  Per(u  <  A)  dX.  (4) 

JQ  J —oo 

Here  the  event  (u  <  A)  denotes  the  domain  Q \  =  {x  G  fl  :  u(x)  <  A}. 

To  better  understand  the  formula,  imagine  a  simple  case  when  u  G  C°°  and  A  is 
regular,  i.e.,  Vu(x)  does  not  vanish  on  the  entire  level  set  u  =  A.  Then  the  boundary 
dQ\  is  a  regular  smooth  curve  and  Per(Q>,)  is  exactly  its  Euclidean  length.  Therefore, 
in  the  conventional  sense,  the  co-area  formula  states  that 

TV(tt)  is  a  collective  way  to  sum  up  the  lengths  of  all  level  lines. 

It  is  this  property  that  brings  the  TV  norm  closer  to  meeting  the  requirement  of  an 
ideal  vision  measure.  Generally,  human  vision  tends  to  represent  curves  and  edges  as 
simple  as  possible  for  the  purpose  of  efficient  neuronal  data  compression  and  visual 
communication  [DonOO],  Such  representation  is  achieved  by  having  the  local  small 
ripples  ignored  or  filtered  out,  just  as  having  the  curve  lengths  shortened.  This  is  the 
vision  rationale  for  the  minimization  of  the  TV  norm  and  the  B  V  image  model. 

3.2  TV  restoration:  Model  and  theory 

Section  2  and  3.1  lay  out  the  vision  and  mathematical  foundations  for  the  original 
restoration  model  of  Rudin,  Osher,  and  Fate  mi  [ROF92,  R094]. 

As  in  Section  2,  assume  that  a  given  image  uq  is  noisy  and  blurred: 

uo  =  Ku  +  n, 

and  in  addition,  the  ideal  image  u  is  assumed  in  BV(fi).  Then  the  Bayesian  restoration 
energy  first  proposed  by  Rudin,  Osher,  and  Fatemi  [ROF92,  R094]  is 

E[u\u0]  =  TV(u)  +  E[u0H 

where  the  data  model  F[w,oM 's  as  giyen  in  (2).  More  explicitly,  we  are  to  minimize 

E\u\uq\=  /  \Du\  +  /  (Ku  —  Uo)2w{x)dx,  (5) 

Jn  Jq 
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with  w{x)  =  const./ a2  (x).  In  the  original  paper  [ROF92],  the  noise  is  assumed  to  be 
homogeneous,  and  thus  w(x)  =  A/2  is  a  constant  weight,  with  A  taking  the  effect  of  a 
Lagrange  multiplier. 

The  existence  and  uniqueness  of  the  TV  restoration  model  in  BV(fi)  fj  Ir(Sl)  were 
proven  by  Chambolle  and  Lions  using  the  direct  method  [CL97].  The  major  properties 
leading  to  the  proof  are  the  lower  semi-continuity  and  L1  compactness  as  outlined  in 
the  previous  section.  The  basic  assumptions  ensuring  the  existence  and  uniqueness  are 

(a)  (blurring  model)  The  linear  blurring  operator  A'  :  L2(fl)  — >  L2(fl)  is  continuous, 
lowpass :  K1  =  1,  and  injective  (for  uniqueness). 

(b)  (noise  model)  The  noise  n  has  mean  0  and  variance  a  2 ,  known  a  priori. 

(c)  (independence  of  blurring  and  noise)  Var(u  o)  >  o2 . 

We  should  say  a  few  more  words  about  the  last  condition.  In  the  current  data  model, 
we  have  assumed  that  the  blurring  K  and  noise  n  are  independent.  Therefore,  from 
probability, 

Var(w0)  =  Var (Ku)  +  Var(n)  =  o2  +  Var(A'w)  >  a2 . 


(If  both  the  blurring  K  and  the  ideal  image  u  are  deterministic,  then  equality  is  indeed 
achieved.)  But  from  the  application  point  of  view,  most  often  we  are  only  given  one 
single  observation  uq,  despite  that  w0  is  a  random  field.  Therefore,  the  last  condition  is 
numerically  understood  and  inspected  in  the  ergodic  sense: 


Var  ( wo) 


dx, 


where  |fi|  denotes  the  Lebesgue  measure  of  the  image  domain. 

Recently,  Chan,  Osher,  and  Shen  has  extended  the  TV  restoration  model  (5)  to  data 
that  live  on  general  graphs  (the  so-called  digital  TV ),  and  to  “non-flat”  data  or  image 
features  (such  as  chromaticity  and  orientations  of  optical  flows)  that  live  on  Riemannian 
manifolds  [COSOl,  CSOld], 


3.3  TV  restoration:  Computation  and  approximation 

To  computationally  realize  the  TV  restoration  model  (5),  as  first  proposed  by  Rudin, 
Osher,  and  Fatemi  [R094],  one  typically  takes  the  steepest  descent  method  (time  march¬ 
ing)  or  directly  solves  the  associated  equilibrium  equation  (steady  solution)  by  iterative 
methods  [V096,  DV97].  Here  we  discuss  the  latter. 

Formally,  or  assuming  u  in  a  finer  space  H  1  iT>),  we  find  that  the  equilibrium  equa¬ 
tion  for  energy  (5)  is  given  by 

V  •  -  2K*w{Ku  -  u0)  =  0,  (6) 

with  the  Neumann  adiabatic  boundary  condition.  Here  K  *  is  the  adjoint  of  K,  and 
w(x)  =  A/ 2  corresponds  to  the  homogeneity  of  the  noise  in  the  original  Rudin-Osher- 
Fatemi  model.  If  indeed  u  G  H1(fl).  then  the  differential  equation  is  understood  in  the 
weak  sense  as  in  the  classical  theory  of  elliptic  equations  (i.e.,  in  {H1)' ). 
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For  a  smooth  function  u,  the  differential  term  in  (6)  at  a  regular  (i.e.  with  non¬ 
degenerate  gradient)  pixel  xo  is  exactly  the  curvature  of  the  level  line  u  =  u(x o), 
which  once  more  reveals  the  geometry  encoded  into  the  model. 

All  the  existing  numerical  algorithms  have  depended  on  some  form  of  relaxation 
of  the  TV  norm.  That  is. 


to  replace 


dx. 


Here  <fie(p)  :  R  — >  R,  with  <f>'€(p)  >  0,  forp  >  0  is  a  C1  convex  even  function  that 
mollifies  that  original  \p\,  and  e  is  a  small  relaxation  parameter  that  often  models  the 
sensitivity  of  a  vision  system.  Popular  choices  include 


</>e(p)  =  \Jv 2  +  e2  :=  \p\e, 

and  the  integral  of  fe(p)  =  e  V  p-1  A  e-1  with  cpe( 0)  =  0.  (Here  the  wedge  notations 
represent  the  ceiling  and  grounding  operators  [CL97])  Consequently,  the  equilibrium 
equation  for  such  a  relaxation  is  modified  to 


V  •  Vu)  -  I<*w(Ku  -  u0 )  =  0,  (7) 

It  can  be  shown  that  the  solutions  to  these  two  relaxed  problems  are  all  in  H 1  (fl). 

Computationally,  the  nonlinear  equation  (6)  is  often  solved  iteratively  by  the  freez¬ 
ing  technique.  That  is,  at  each  step  n,  the  next  update  u (,)+l  1  solves  the  linearized 
Poisson  equation  with  blurring  and  fitting: 

V  •  (Vn)  Vu i)  -  K*w(I<u  -Uo)  =  0,  (8) 

where  %  =  cp'e(\V u\) / \V u\  >  0  is  the  diffusivity  coefficient,  and  is  freezed  at  the  cur¬ 
rent  step.  Therefore,  from  the  energy  point  of  view,  the  update  is  the  unique  minimizer 
to  the  elliptic  energy 

/  zin)\Vu\'2dx+  I  (Ku  —  u°)2w(x)dx 

Jn  Jn 

in  H1(fl).  The  convergence  of  such  algorithms  has  been  confirmed  in  [DV97,  CL97]. 
Furthermore,  it  is  even  possible  to  endow  an  energy  meaning  to  the  updating  of  z  itself: 
An+i)  (./  (  V (/'  '*  * 1 ;  ) /  V (/*"  r  1  .  For  instance,  for  the  choice  of  o,  (p)  =  /> ,  ,  it  is 
easy  to  show  that  z +  l  1  is  the  minimizer  of 


(n+D|2 


dx. 


(For  the  other  example,  see  Chambolle  and  Lions  [CL97].)  Therefore,  in  this  case,  the 
iterative  algorithm  based  on  the  freezing  technique  is  essentially  to  minimize 


Unlike  the  original  Rudin-Osher-Fatemi  model  [ROF92,  R094],  it  now  contains  a  new 
feature  z,  which  is  often  called  the  auxiliary  variable  in  the  vision  community  [GR92], 
We  also  call  v  the  edge  signature  since  if  z  is  plotted  as  an  image,  then  the  dark  fi.e., 
small  z)  and  thin  (depending  on  e)  stripes  clearly  outline  the  edges  in  the  image. 

4  BV  as  an  Interpolant:  Image  Inpainting 

4.1  The  problem  of  inpainting 

The  word  “inpainting”  is  an  artistic  synonym  for  “image  interpolation,”  as  initially 
circulated  among  museum  restoration  artists,  who  manually  recover  the  cracks  of  de¬ 
graded  ancient  paintings  by  following  as  faithfully  as  possible  the  intention  of  their 
original  creators. 

Recently,  the  concept  of  inpainting  has  been  connected  to  many  major  problems 
in  image  processing  and  low-level  vision,  such  as  perceptual  image  coding  and  com¬ 
pression,  and  error  concealment  for  wireless  image  transmission  [CSOla].  We  refer  to 
our  recent  survey  paper  [CS02]  for  much  more  details  on  the  status  of  the  inpainting 
problem. 

Traditionally,  image  interpolation  is  often  restricted  to  problems  with  scattered 
small-scale  missing  data.  Thus  the  approaches  and  algorithms  have  been  mostly  de¬ 
veloped  from  the  viewpoints  of  the  spectral  method,  filtering  method,  wavelets,  and 
radially  symmetric  bases,  etc..  But  for  large-scale  interpolation,  or  “inpainting,”  these 
conventional  approaches  do  not  seem  to  work  well  due  to  the  fundamental  challenge: 
how  to  faithfully  (at  least  visually  meaningfully)  recover  the  missing  edges,  i.e.,  the 
1 -dimensional  singular  feature  of  images. 

Away  from  these  classical  methods,  Bertalmio  et  al.  [BSCBOO]  has  recently  intro¬ 
duced  the  idea  of  applying  transport  type  of  high  order  PDEs  to  complete  the  broken 
edges.  The  authors  of  the  present  paper  then  took  a  different  approach  by  having  the 
inpainting  problem  embedded  into  the  general  category  of  image  restoration  problems. 
As  a  result,  our  inpainting  models  have  been  based  on  the  Bayesian  framework  of  Sec¬ 
tion  2.  The  first  image  model  catching  our  attention  was  the  B  V  image  model  of  Rudin, 
Osher,  and  Fatemi.  This  is  the  TV  inpainting  model  that  we  first  proposed  and  studied 
in  [CSOla]. 

As  one  shall  see  in  the  next  section,  TV  inpainting  is  almost  identical  to  the  original 
TV  restoration  model  (5).  The  beauty  lies  in  that  the  slight  modification  dramatically 
extends  its  scope  of  applicability,  and  reveals  many  unexpected  connections  to  other 
important  problems  in  image  and  low-level  vision  analysis,  such  as  perceptual  image 
coding  and  super-resolution  [CSOla], 

After  the  TV  inpainting  was  first  introduced,  our  further  recent  works  have  demon¬ 
strated  that  as  a  special  restoration  problem,  inpainting  does  carry  its  own  identity  and 
important  differences  from  the  more  familiar  types  such  as  denoising  and  deblurring. 
In  [CSOlc,  CKS01,  ES02],  we  have  shown  that  the  BV  image  model  is  insufficient 
for  large  scale  inpainting  problems,  and  high  order  geometric  image  models  based  on 
curvatures  are  necessary  for  more  faithful  reconstruction  of  partially  missing  edges. 

Nevertheless,  the  BV  image  model  still  remains  to  be  the  simplest  and  effective 
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image  interpolant,  and  TV  inpainting  is  one  of  the  very  few  inpainting  models  that 
allow  both  complete  theoretical  analysis  and  efficient  computational  implementation. 
And  even  for  realistic  applications,  it  always  provides  a  valuable  lower  order  initial 
guess  for  computationally  expensive  high  order  models.  We  now  explain  TV  inpainting 
and  some  of  its  major  applications  in  digital  image  processing. 

4.2  The  TV  inpainting  model  by  Chan  and  Shen 

Let  D  C  Cl  denote  the  compact  inpainting  domain,  on  which  the  observation  u  0  1 D  is 
missing.  The  goal  of  inpainting  is  to  recover  the  ideal  image  u  on  the  entire  domain  Q 
based  on  the  available  portion  uq 

It  is  quite  obvious  that  generally  the  inpainting  problem  is  highly  ill-posed:  without 
the  input  from  high-level  vision  operators,  such  as  symmetry  detection  or  more  gen¬ 
eral  pattern  learning,  it  is  impossible  to  inpaint  an  object  that  is  completely  missing. 
However,  the  stroke  of  luck  does  shine  in  many  major  digital  applications  [CSOla] 
(e.g.,  random  packet  loss  in  wireless  image  transmission,  image  zooming  and  super¬ 
resolution,  etc.)  in  that  indeed  retains  crucial  information  about  uo\D- 

The  generative  data  model  for  inpainting  is 

u0\D  =  (Ku  +  n)\D. 

Under  the  assumptions  in  Section  2,  it  leads  to  the  energy  form 

Ed[uo\u]  =  /  (Ku  —  uo)2w{x)dx, 

Jn\D 

with  the  weight  w(x)  =  l/cr2(;r).  Let  wD(x)  denote  the  zero  extension  of  w(x) 
onto  the  whole  domain  f 2,  i.e., 

wD(x)  =  (1  —  Id  (x))w(x). 

Then  the  energy  for  the  data  model  can  also  be  written  as 

Ed\uq\v\=  /  (Ku  —  Uo)2wD (x)dx,  (9) 

Jn 

where  uq  \Q\D  is  extended  to  uo  =  uq  |  Q  in  any  manner  since  it  is  wiped  out  by  w D  (x) 
anyway. 

Then  Bayesian  inpainting  based  on  the  BV  image  model  is  to  minimize 

En[u\uo]  =  I  \Du\  +  I  (Ku  —  uo)2wD (x)  dx,  (10) 

Jn  Jn 

which  is  almost  identical  to  the  original  TV  restoration  model  of  Rudin,  Osher,  and 
Fatemi  (5),  only  with  an  adjustment  on  the  weight  function.  Consequently,  they  share 
the  same  form  of  the  PDE: 

V  •  -  K*wD (Ku  -  uo)  =  0,  (11) 
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or  for  the  sake  of  the  freezing  algorithm  mentioned  above,  L  zu  =  f  with 


/  =  K*wdu0,  L-  =  -V  •  2V  +  K*wdK,  z  =  — .  (12) 

|Vu| 

The  associated  boundary  condition  along  <90  is  again  Neumann  adiabatic. 

What  is  the  mathematical  difference  between  the  TV  restoration  models  (5,  6)  and 
the  TV  inpainting  models,  caused  by  the  almost  trivial  modification  of  the  weight  func¬ 
tion?  Unlike  the  situation  when  the  weight  function  w(x)  >  A/2  >  0  for  all  pixels, 
the  inpainting  energy  (10)  is  no  longer  strictly  convex.  Therefore,  as  shown  by  Chan, 
Kang,  and  Shen  [CKS01],  the  existence  of  TV  inpainting  in  BV(fl)  is  guaranteed,  but 
generally  uniqueness  is  not.  In  terms  of  the  iterative  algorithm  based  on  the  freezing 
technique,  the  non-uniqueness  is  caused  by  the  fact  that  the  linearized  operator  L  z  is 
only  semi-positive  definite. 

As  explained  in  our  paper  [CKS01],  the  non-uniqueness  of  TV  inpainting  may  not 
be  necessarily  a  defect  of  the  model,  but  instead,  an  intrinsic  part  of  the  inpainting 
problem  itself.  To  certain  degree,  it  models  the  uncertain  situation  of  human  deci¬ 
sion  making  when  the  given  information  is  generated  by  two  or  more  equally  possible 
patterns. 

All  the  computational  methods  discussed  in  Section  3.3  apply  here  too.  In  addition, 
to  increase  the  sparsity  of  the  linear  system  for  general  blurring  kernel  K,  we  have 
modified  the  freezing  iterative  scheme  at  each  step  n  to 

[/..;  +  (wD  -  K*wDK)]u(n+1)  =  (wD  -  K*wDI<)u(n)  +  /. 


Figures  1  and  2  display  the  computational  outputs  for  two  images  with  simulated 
digital  blurring,  noising,  and  random  packet  loss  with: 


Ki 
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(13) 


Here  the  asterisks  indicate  that  the  powers  are  in  the  convolutional  sense.  K \  and  I\ 2 
simulate  the  continuous  isotropic  blurring  and  directional  motion  blurring  separately. 
These  simulated  results  clearly  demonstrate  the  power  of  TV  inpainting  for  the  poten¬ 
tial  market  of  noisy  transmission  of  blurred  images  with  randomly  lost  packets,  images 
from  the  Hubble  telescope,  for  example  [V096,  CW98]. 

There  are  also  some  important  applications  that  cannot  be  covered  by  the  continu¬ 
ous  language,  yet  made  possible  by  the  extension  of  the  TV  norm  onto  general  graphs 
by  Chan,  Osher,  and  Shen  [COSOl].  Zooming  and  perceptual  image  coding  are  such 
examples  that  we  first  studied  in  [CSOla]. 

Let  us  discuss  a  simplified  version  of  the  digital  zoom-in  problem  and  the  TV 
inpainting  approach.  The  goal  of  zoom-in  is  to  create  a  2 N  x  2 N  digital  image 
[ttjj]  (1  <  i.j  <  2N)  from  its  possibly  noisy  coarse  sampling  [it  ^/d  </../<, V). 
Thus  the  weight  function  in  this  digital  setting  is  given  by 


wfj  =  1,  2|t  &  2| j;  0,  otherwise. 
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TV  deblurring  and  inpainting 


Figure  1 :  TV  Inpainting  of  a  blurred  image  (by  K  \  in  ( 1 3  ) )  with  simulated  random 
packet  loss. 


In  the  paper  [CSOla],  we  proposed  the  following  zoom-in  model  based  on  TV  inpaint¬ 
ing: 

H'.,,  ([</,..,-])+  X]  W/,;  (14) 

l<i,j<2N 

where  TVS  denotes  the  digital  TV  norm  on  graphs  (a  model  for  digital  domains),  as 
introduced  in  [COSOl],  Figure  3  shows  the  computational  output  of  the  model  when 
applied  to  a  test  image  from  Caltech’s  computational  vision  group.  The  comparison  has 
been  made  between  the  BV  image  model  and  the  Sobolev  one  (i.e.,  E[u]  =  fn  |  Vw|2). 
One  can  clearly  observe  that  BV  yields  much  better  reconstruction  in  terms  of  the 
sharpness  of  object  boundaries. 

The  second  non-trivial  application  connects  inpainting  to  perceptual  image  cod¬ 
ing,  compression,  and  reconstruction  [CSOla],  An  example  based  on  digital  TV  in¬ 
painting  (14)  (with  a  different  weight  wD )  is  presented  in  Figure  4.  We  refer  to  our 
paper  [CSOla]  for  more  details. 

These  examples  clearly  demonstrate  the  beauty  of  a  good  image  model  -  it  facili¬ 
tates  all  processing  tasks,  as  a  lighthouse  does  for  successful  navigation. 


5  Beyond  BV  and  Conclusion 

5.1  The  Mumford-Shah  image  model 

A  sibling  to  BV  images  is  the  celebrated  object-edge  model  of  Mumford  and  Shah  [MS89]: 

Ea[u,T]=  f  Yu  -,/./• -l-n//1  (I  ). 

Jn\  r 
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noisy  motion-blurred  image  with  missing  data 


TV  restoration  and  inpainting 


Figure  2:  TV  inpainting  of  a  noisy  blurred  image  (by  K o  in  (13))  with  simulated  ran¬ 
dom  packet  loss. 


where  H 1  (F)  is  the  one-dimensional  Hausdorff  measure  of  the  “edge”  set  F,  and  a  >  0 
is  a  fixed  weight.  It  is  easy  to  see  that  for  near-cartoon  images  (i.e.,  the  jump  set  1  is 
piecewise  smooth,  and  Vw  1  on  fl  \  T),  the  Mumford-Shah image  model  Ea[u,  T] 
is  equivalent  to  TV[w],  since  the  latter  as  a  Radon  measure  is  concentrated  along  the 
jump  set  as  well.  The  kinship  between  the  two  image  models  can  also  be  seen  from 
a  unified  viewpoint  based  on  the  edge  signature  function  2  introduced  in  Section  3.3. 
Such  an  approach  has  been  well  known  in  the  vision  community  [GR92].  In  Section 
3.3,  it  has  been  established  that  the  BV  image  model  is  approximately  (controlled  by 
e  1)  equivalent  to 


Etv[u,z]  =  J  ^z2\Vu 


2  +ez2  + 


dx. 


Here  we  have  replaced  the  original  2  by  2  2  since  it  is  positive  as  seen  from  the  freezing 
algorithm  in  Section  3.3.  On  the  other  hand,  under  the  T-convergence  approximation 
theory  (Ambrosio  and  Tortorelli  [AT90,  AT92]),  the  edge  set  T  in  the  Mumford-Shah  is 
also  replaced  by  an  edge  “signature”  function  2,  and  the  image  model  is  approximately 
equivalent  to 


Ems[u,z]  =  J  ^22|Vu|2  +a(e|V2|2  +  ^  ^  dx. 

Therefore,  by  introducing  the  edge  signature  function  2,  both  the  BV  image  model  of 
Rudin-Osher-Fatemi  and  the  object-edge  image  model  of  Mumford  and  Shah  belong 
to  the  same  class  of  coupled  energies: 


(22|Vw|2  +  ge(z,  V2,  V  O  V2))  dx, 


where  ge  is  a  suitable  function  controlled  by  a  small  parameter  e,  and  V  O  V  denotes 
the  Hessian  operator. 
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The  original  image  Zoom-out  by  a  subsampling  of  factor  4 


The  harmonic  zoom-in 


The  TV  zoom-in 


Figure  3:  Bayesian  zoom-in’s  based  on  the  BV  and  Sobolev  image  models:  TV  inpaint¬ 
ing  yields  much  sharper  boundaries  [CSOla]  (test  image  from  Caltech’s  computational 
vision  group). 


The  Mumford-Shah  image  model,  once  computationally  realized  (such  as  by  the 
level-set  method  of  Osher  and  Sethian  [OS88],  as  recently  studied  by  Tsai,  Yezzi,  and 
Willsky  [TAYW01],  and  Chan  and  Vese  [CVOO]),  is  very  powerful  for  image  denoising 
and  segmentation.  Novel  applications  to  the  inpainting  problem  have  been  studied  re¬ 
cently  by  Chan  and  Shen  [CSOla],  Tsai,  Yezzi,  and  Willsky  [TAYW01],  and  Esedoglu 
and  Shen  [ES02], 

5.2  Gousseau  and  Morel:  Natural  images  are  NOT  BV 

This  remarkable  recent  result  of  Gousseau  and  Morel  [GMO 1  ]  is  the  fruit  of  a  successful 
combination  of  statistical  image  study  and  mathematical  analysis  of  image  models. 

The  key  is  the  following  lower  bound  for  the  TV  norm  by  the  so  called  sectional 
density  fu(h ,  s ): 

r\n\ 

TV(w)>7T i  /  s^fu(h,s)ds, 

Jo 

where  |fl|  denotes  the  Lebesgue  measure  (or  area)  of  the  image  domain,  s  an  “area” 
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The  original  image 


Edge  tube  from  Canny's  detector 


The  initial  guess 


The  TV  inpainting 


Figure  4:  Perceptual  image  coding  and  decoding  by  the  TV  inpainting 
model  [CS01a](test  image  from  Caltech’s  computational  vision  group). 


parameter,  and  h  a  quantization  level  of  the  image.  Roughly  speaking,  for  a  given 
image  u  and  a  quantization  level  h,  j „  { h.  s)ds  denotes  the  number  of  disjoint  /i-“blobs” 
(or  /i-sections)  whose  areas  fall  within  [s,  s  +  ds\  (see  [GM01]  for  more  details).  The 
empirical  statistics  based  on  many  natural  images,  obtained  by  the  same  school  of 
authors  [AGM99],  reveals  the  following  power  law: 

,  , ,  ,  const. 

fu(h,  s)  =  ,  with  a  «  2, 

for  any  generic  and  homogeneous  natural  image  u.  Therefore, 

r\Q\  i 

TV(w)  >  const.  /  - 7777  ds, 

Jo  sa~1/2 

which  diverges  at  s  =  0  for  all  a  >  3/2. 

In  Meyer’s  u  +  v  language  [MeyOl],  the  result  reveals  that  the  ^-component  for 
generic  natural  images  contains  too  many  small  scale  “blobs”  (clustering  controlled  by 
a  quantization  level  h ),  which  makes  generally  u  +  v  /  BV(fl). 

Therefore,  this  negative  assertion  still  does  not  cloud  the  positive  role  of  Rudin- 
Osher-Fatemi’s  BY  images  in  the  successful  modeling  of  the  u  component. 
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5.3  Conclusion 

A  generic  image  seems  to  be  the  composition  of  two  components:  u+v,  as  Meyer  [MeyOl] 
puts  it  recently.  Roughly  speaking,  u  is  the  deterministic  component,  and  v  is  the  “tex¬ 
ture”  or  “clutter”  component  [MG01],  characterized  by  rapid  oscillations  but  still  away 
from  being  white  noise.  The  (.’-component  carries  a  delicate  correlation  between  space 
and  spatial  frequencies,  and  as  a  result,  statistical,  spectral,  and  wavelets  tools  are  ideal. 
The  (/-component  is  more  geometric,  and  embedded  with  the  crucial  information  of  de¬ 
terministic  and  large-scale  features  such  as  edges,  corners,  and  T-junctions.  The  BV 
image  model  of  Rudin,  Osher  and  Fatemi  is  one  of  the  very  few  successful  models 
for  the  (/-component,  which  are  both  theoretically  accessible  and  computationally  ef¬ 
ficient.  For  images  with  low  textures  (i.e.,  the  ergodic  variance  Var(u)  1),  such 
as  many  indoor  scenes  capturing  large  objects,  the  BV  image  model  by  itself  is  often 
sufficient  for  the  tasks  like  denoising,  deblurring,  and  inpainting.  This  viewpoint  has 
been  strongly  supported  by  various  computational  results. 
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